Introduction: Cytomorphology and differential count (DC) of peripheral blood (pB) are the first methods applied in hematology. Smears are assessed after MGG staining, and technicians and hematologists count 100-200 cells. The method is time-consuming and not fully reproducible. We now report the final results of our prospective BELUGA clinical trial (NCT04466059), which used an AI-driven, cloud-based platform to provide DC with higher reproducibility and shorter turnaround times than the manual approach.

Aim: To test the value of AI-driven differential counts in a specialized hematology laboratory by comparison with human review, and to create a cloud-based web platform.

Methods: We used a fully automated microscope (MetaSystems, Altlussheim, Germany) (100x without oil and 400x with oil) and captured ~500 cells per smear as single-cell images (capture time per smear: 4:30 min). We prospectively enrolled 29,119 routine cases sent to us for DC between 1/2021 and 7/2022 and processed them in parallel for AI classification. All cases were differentiated by technicians and hematologists (four-eyes principle, ISO 15189 and CAP accredited). 21 different classes of benign and malignant cells were discriminated by a supervised machine learning model in its 5th iteration, trained on 98,863 carefully annotated, balanced cell images using Amazon SageMaker in a cloud environment. For model training we targeted at least 1,000 cell images per class.

Results: For the manual DC, a median of 100 cells/sample (range 82-130) was evaluated, totaling 2,911,915 cells. Next, a median of 500 cell images/sample (range 75-500) was captured by the automated scanner (14,322,972 overall) and subsequently uploaded to our cloud-based web application for prediction. For each image, a probability score was returned for each of the 21 classes.
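As an illustration of how per-image probability scores could be turned into a differential count, the following minimal sketch assigns each image to its highest-scoring class and reports the share of each class. The class names, the five-class subset, the toy scores, and the function name `differential_count` are assumptions made for this example; the abstract does not describe how the platform aggregates scores.

```python
from collections import Counter

# Hypothetical subset of class names for illustration; the study used 21 classes.
CLASSES = ["segmented neutrophil", "band neutrophil", "lymphocyte", "monocyte", "blast"]


def differential_count(probabilities, classes=CLASSES):
    """Assign each image to its highest-scoring class and return percentages per class.

    `probabilities` is a list with one entry per cell image; each entry is a
    list of scores, one per class, in the same order as `classes`.
    """
    if not probabilities:
        raise ValueError("no images to classify")
    counts = Counter()
    for scores in probabilities:
        if len(scores) != len(classes):
            raise ValueError("each image needs one score per class")
        # Index of the class with the highest probability score.
        best = max(range(len(scores)), key=scores.__getitem__)
        counts[classes[best]] += 1
    total = len(probabilities)
    return {name: 100.0 * counts[name] / total for name in classes}


# Toy scores for four images (invented, not study data).
example = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.10, 0.80, 0.05, 0.03, 0.02],
    [0.05, 0.05, 0.85, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.10, 0.80],
]
dc = differential_count(example)
for name, pct in dc.items():
    print(f"{name}: {pct:.1f}%")
```

Taking the top-scoring class is only one possible rule; a production system might instead apply per-class thresholds or route low-confidence images to manual review.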

The emphasis of the study was on the ability of the AI to recall all pathological cells. Blasts (2-99 cells) were identified manually (M) in 2,040 cases, of which the AI identified 1,921 (94%, 1-302 cells). Neoplastic lymphocytes (2-93 cells) were identified manually in 1,781 cases, of which 1,756 (99%, 1-287 cells) were also identified by AI. Atypical promyelocytes in pB in APL were identified by AI in all cases (n=23 cases, 100%). In hairy cell (HZ) leukemia (n=156 cases, diagnosed by flow cytometry), 70% were detected by both M and AI. In 18%, M saw <2% HZ only after being informed of the flow results. In 12%, only AI detected HZ, showing superiority over the routine workflow. Of note, manual data were supported in all cases by immunophenotyping pointing to pathological cells; the AI tool had no access to this comprehensive workflow. Only 0.27% (181/66,721) of cells in the blast category were misclassified by AI. Of those, 143/181 (79%) were classified as another pathological cell type, which did not alter the clinical diagnosis.
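The case-level detection rates quoted above are simple ratios of cases found by AI to cases found manually. A minimal sketch of that arithmetic, using only the case counts reported in this abstract (the function name `detection_rate` is an assumption for illustration):

```python
def detection_rate(cases_found_by_ai, cases_found_manually):
    """Percentage of manually identified cases that the AI also identified."""
    if cases_found_manually <= 0:
        raise ValueError("manual case count must be positive")
    return 100.0 * cases_found_by_ai / cases_found_manually


# (AI-positive cases, manually positive cases) as reported in the abstract.
reported = {
    "blasts": (1921, 2040),
    "neoplastic lymphocytes": (1756, 1781),
    "atypical promyelocytes (APL)": (23, 23),
}

rates = {
    name: round(detection_rate(ai, manual), 1)
    for name, (ai, manual) in reported.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Rounded to whole percentages, these values match the 94%, 99%, and 100% given in the text.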

Segmented neutrophils (seg) accounted for 54% by M vs. 44% by AI, and band neutrophils (band) for 1% by M vs. 7% by AI. We observed no relevant differences in the following classes: eosinophils (M: 2.3% vs. AI: 2.45%), basophils (M: 0.76% vs. AI: 0.90%), and monocytes (M: 6.96% vs. AI: 7.05%).

For 318 cases we manually reviewed the captured cell images and the classes previously assigned by AI. In 88% (66,721/90,773) of the images, the AI classification was left unchanged. Of the 12% (8,303/66,721) reclassified cell images, 56% (4,645) were normal lymphocytes that the AI had classified as neoplastic lymphocytes. The remaining misclassifications were mostly among benign classes (905 band instead of seg; 834 seg instead of band; 793 metamyelocytes reassigned to various neighboring classes). None of this affected the decision between normal and pathological cases.
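A review of this kind can be summarized by tallying, for each image, whether the reviewer kept or changed the AI label, and which label pairs were swapped most often. The sketch below shows one way to do that; the function name `tally_reclassifications`, the label strings, and the toy data are assumptions for illustration and do not reproduce the study's tooling or results.

```python
from collections import Counter


def tally_reclassifications(reviews):
    """Summarize manual review of AI labels.

    `reviews` is an iterable of (ai_label, reviewer_label) pairs, one per image.
    Returns the percentage of images left unchanged and a Counter of
    (ai_label, reviewer_label) pairs for the images that were changed.
    """
    total = 0
    changed = Counter()
    for ai_label, reviewer_label in reviews:
        total += 1
        if ai_label != reviewer_label:
            changed[(ai_label, reviewer_label)] += 1
    if total == 0:
        raise ValueError("no reviewed images")
    n_changed = sum(changed.values())
    unchanged_pct = 100.0 * (total - n_changed) / total
    return unchanged_pct, changed


# Toy review data (invented): ten images, two of them relabeled by the reviewer.
toy_reviews = [("seg", "seg")] * 5 + [("lymphocyte (normal)", "lymphocyte (normal)")] * 3
toy_reviews += [
    ("band", "seg"),
    ("lymphocyte (neoplastic)", "lymphocyte (normal)"),
]

unchanged_pct, changed = tally_reclassifications(toy_reviews)
print(f"unchanged: {unchanged_pct:.1f}%")
for (ai_label, reviewer_label), n in changed.most_common():
    print(f"{ai_label} -> {reviewer_label}: {n}")
```

Keeping the direction of each change (AI label first, reviewer label second) makes it easy to see whether errors cluster between neighboring classes such as band and segmented neutrophils.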

Conclusions: A head-to-head comparison between manual human review and AI-based review of the pB differential count was performed in a prospective, blinded study with >29,000 cases. AI classifier results were comparable to routine methods, with the added benefit of total reproducibility regardless of the experience of the individual diagnostician. Additionally, the automated upload and computation time for 500 images is <30 sec and multiple samples can run in parallel, thus reducing the turnaround time (TAT). Manual overread of AI results is possible through a simple web application interface, and the system can flag critical cases for priority review. Thus, additional required diagnostic methods can be initiated earlier and in a more targeted way.

Haferlach: Munich Leukemia Laboratory: Current Employment, Other: Part ownership. Nadarajah: MLL Munich Leukemia Laboratory: Current Employment. Haferlach: MLL Munich Leukemia Laboratory: Current Employment, Other: Ownership. Kern: MLL Munich Leukemia Laboratory: Current Employment, Other: Ownership. Pohlkamp: MLL Munich Leukemia Laboratory: Current Employment.
